Entry Name:  “OvGU-Held-MC2”

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Pascal Held, Otto von Guericke University Magdeburg, Germany, pascal.held@ovgu.de  PRIMARY

Christian Braune, Otto von Guericke University Magdeburg, Germany, christian.braune@ovgu.de

Rudolf Kruse, Otto von Guericke University Magdeburg, Germany, rudolf.kruse@ovgu.de

 

 

Student Team:  NO

 

Did you use data from both mini-challenges?  YES

 

Analytic Tools Used:

Self-developed scripts for analysis and visualization.

Python

Matplotlib

NumPy / SciPy

 

Approximately how many hours were spent working on this submission in total?

60h

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

Video:

OvGU-Held-MC2.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication.  For each of these IDs

 

      a.         Characterize the communication patterns you see.

      b.        Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

 

Several guests communicate heavily, but two of them have extremely high communication frequencies. The user with ID 1278894 sent 190360 messages to 2521 other guests. The user with ID 839736 sent 60812 messages to 8720 other guests. The next most active users sent about 3000 to 4000 messages to 500 to 600 peers each. Because of this large gap, we focus on the two high-frequency users.
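The per-sender counts quoted above can be obtained with a simple aggregation. The following is a minimal sketch, assuming the communication log is available as (sender, receiver) pairs; the function names and the sample data are illustrative and not taken from our scripts.

```python
from collections import defaultdict


def sender_volumes(messages):
    """Return {sender: (message_count, distinct_receiver_count)}."""
    counts = defaultdict(int)
    peers = defaultdict(set)
    for sender, receiver in messages:
        counts[sender] += 1
        peers[sender].add(receiver)
    return {s: (counts[s], len(peers[s])) for s in counts}


def top_senders(messages, n=2):
    """Return the n senders with the most messages, highest first."""
    volumes = sender_volumes(messages)
    return sorted(volumes.items(), key=lambda kv: kv[1][0], reverse=True)[:n]


# Made-up sample, not challenge data.
sample = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "A")]
print(top_senders(sample, n=1))
```

Sorting by message count first makes a gap such as the one between the two service IDs and the remaining guests visible directly in the ranking.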

 

[Figures: activity_1278894.png, no_receivers_1278894.png]

 

As the figures above show, user 1278894 sends messages very regularly. A closer look at the data shows that, from 12 to 21 o’clock, messages are sent at exact 5-minute intervals for one hour, followed by a one-hour pause. The messages go to almost the same guests on all three days. Around 15 o’clock the number of receivers decreases slightly because guests leave; the same happens in the evening hours. We assume that 1278894 is something like a park information system. It works like a newsletter that announces shows and events in the park. Guests have to subscribe to this service, so not every guest receives it.
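One way to check such a regular sending rhythm is to look at the gaps between consecutive send times. The sketch below assumes the send times of one ID are available as datetime objects; the helper names and the sample schedule are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def send_gaps(timestamps):
    """Gaps in seconds between consecutive distinct send times."""
    times = sorted(set(timestamps))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def dominant_gap(timestamps):
    """Return (most common gap in seconds, share of all gaps), or None."""
    gaps = send_gaps(timestamps)
    if not gaps:
        return None
    gap, count = Counter(gaps).most_common(1)[0]
    return gap, count / len(gaps)


# Made-up schedule: one send every 5 minutes during two separate hours.
start = datetime(2014, 6, 6, 12, 0)
sample = [start + timedelta(hours=h, minutes=m)
          for h in (0, 2) for m in range(0, 60, 5)]
print(dominant_gap(sample))
```

A dominant gap of 300 seconds covering nearly all gaps, with a few much longer gaps in between, is the signature of a scheduled broadcast with pauses.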

 

[Figures: activity_839736.png, no_receivers_839736.png]

 

The number 839736 behaves very differently. Messages are sent throughout the day, in almost every second. Also, the messages go to only a small group of people; in most cases there are fewer than five receivers per second. In total, this ID sends fewer messages to more people than the park information system. We assume that this is something like a support team that guests can ask individual questions. One remarkable anomaly occurs on Sunday at 12 o’clock: a very high amount of communication is recorded, probably guests reporting the crime.

 

 

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

 

Limit your response to no more than 10 images and 1000 words.

 

 

To characterize communication patterns, we focus on the number of sent and received messages and, for each user, on the number of peers that messages are sent to and received from. We excluded all communication with the personal assistance service (839736), the park information system (1278894), and the external recipients. From this data we classified the users into characteristic groups. For this characterization we used plots of message count vs. number of peers for sent and received messages. We also focus only on the people who communicate with the app during the weekend. Only the first fitting rule is applied to classify each guest.
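The four per-user quantities used here (messages sent, peers sent to, messages received, peers received from) can be computed in one pass. This is a minimal sketch, assuming (sender, receiver) pairs with IDs as strings and the external receiver labeled "extern"; the label and the function name are assumptions.

```python
from collections import defaultdict

# IDs excluded from the analysis; the "extern" label is an assumption.
EXCLUDED = {"839736", "1278894", "extern"}


def user_features(messages, excluded=EXCLUDED):
    """Return {user: (sent, peers_sent_to, received, peers_received_from)}."""
    feats = defaultdict(lambda: {"sent": 0, "to": set(),
                                 "received": 0, "from": set()})
    for sender, receiver in messages:
        if sender in excluded or receiver in excluded:
            continue  # skip service and external communication
        feats[sender]["sent"] += 1
        feats[sender]["to"].add(receiver)
        feats[receiver]["received"] += 1
        feats[receiver]["from"].add(sender)
    return {user: (f["sent"], len(f["to"]), f["received"], len(f["from"]))
            for user, f in feats.items()}


# Made-up sample, not challenge data.
sample = [("1", "2"), ("1", "3"), ("2", "1"),
          ("1278894", "1"), ("1", "extern")]
print(user_features(sample))
```

Plotting the first pair of values against each other for all users gives the sending plot, and the second pair gives the receiving plot.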

 

[Figures: send_1.png, received_1.png]

 

First, we group all people who send messages to at most 25 other people (blue, [1]). This group contains the standard guests, who send messages to a small peer group only. 6516 people fit this pattern.

 

Next, we focus on the group marked in green [2]. They have a lot of peers but send at most 3 messages per peer. Maybe these are people who mingle with others and just exchange contact information to stay in touch after the weekend. 1122 people fit this pattern.

 

The next anomaly is the small group marked in red [3] in the receiving plot. This is a closed group of 37 people who receive at least 20 messages from every peer. The group forms exactly one peer group, so its members communicate only with the other members of the group.

 

The people colored cyan [4] are the next visible group. They send messages to about 20 peers, and to at most 25. With 1155 guests fitting this pattern, it is also a large group.

 

The remaining people are not clearly distinguishable, but a significant number of them receive fewer than 400 messages (magenta, [5]). These are 281 persons.

 

[Figure: send_split.png]

 

To distinguish the people not yet classified, we used a histogram of sent messages. We fit two normal distributions and obtain a possible split threshold at approximately 1500 sent messages. So we split the remaining guests into two groups: one with at most 1500 messages sent (yellow, [6]), containing 1958 guests, and the extremely high-frequency group (black, [7]) with more than 1500 messages sent.
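One possible way to obtain such a threshold is to fit a two-component Gaussian mixture with expectation-maximization and take the point between the two means where the weighted densities are equal. The sketch below is an illustration under that assumption, not a record of the exact fitting procedure we used; the synthetic data and all parameter values are made up.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_two_normals(values, iterations=100):
    """Fit [weight, mean, std] for two components with plain EM."""
    data = sorted(values)
    half = len(data) // 2
    params = []
    for part in (data[:half], data[half:]):  # initialize from both halves
        m = sum(part) / len(part)
        s = math.sqrt(sum((x - m) ** 2 for x in part) / len(part)) or 1.0
        params.append([0.5, m, s])
    for _ in range(iterations):
        resp = []  # E-step: responsibility of each component per point
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        for k in range(2):  # M-step: re-estimate weight, mean, std
            nk = sum(r[k] for r in resp)
            m = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - m) ** 2 for r, x in zip(resp, data)) / nk
            params[k] = [nk / len(data), m, max(math.sqrt(var), 1e-6)]
    return sorted(params, key=lambda p: p[1])


def split_threshold(params):
    """Bisect between the means for equal weighted densities."""
    (w1, m1, s1), (w2, m2, s2) = params
    lo, hi = m1, m2
    for _ in range(60):
        mid = (lo + hi) / 2
        if w1 * normal_pdf(mid, m1, s1) > w2 * normal_pdf(mid, m2, s2):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Synthetic sent-message counts, not challenge data.
random.seed(0)
synthetic = ([random.gauss(800, 150) for _ in range(500)]
             + [random.gauss(2500, 300) for _ in range(500)])
fitted = fit_two_normals(synthetic)
threshold = split_threshold(fitted)
```

Users with a sent-message count at or below the threshold fall into the lower group, the others into the high-frequency group.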

 

[Figures: send_2.png, received_2.png]

[Figure: no_per_cluster.png]

Our clustering shows seven communication patterns. In addition, we have the two different patterns of the park information system and the personal assistance service described in MC2.1. There are also 1944 users who did not communicate at all, which could be seen as a pattern, too.
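The first-fitting-rule classification can be sketched as an ordered list of predicates. The version below is only an illustration: it uses the thresholds quoted in the text for groups 1, 2, 5, 6 and 7, reads "at most 3 messages per peer" as an average, and omits groups 3 and 4 because their rules are not specified precisely enough here. The dictionary keys, the rule order and the rule bodies are assumptions.

```python
# Ordered (label, predicate) pairs; the first matching rule wins.
RULES = [
    (1, lambda u: u["sent_peers"] <= 25),            # small peer group
    (2, lambda u: u["sent"] <= 3 * u["sent_peers"]), # few messages per peer
    (5, lambda u: u["received"] < 400),              # few received messages
    (6, lambda u: u["sent"] <= 1500),                # below the split threshold
]
FALLBACK = 7  # extremely high-frequency senders


def classify(user, rules=RULES, fallback=FALLBACK):
    """Return the label of the first rule that fits the user."""
    for label, predicate in rules:
        if predicate(user):
            return label
    return fallback


# Made-up users, not challenge data.
print(classify({"sent": 10, "sent_peers": 5, "received": 8}))
print(classify({"sent": 2000, "sent_peers": 40, "received": 1000}))
```

Because only the first fitting rule applies, the order of the list matters: a later rule only sees users that no earlier rule has claimed.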

 

 

[Figure: send_recv_cluster.png]

 

In the figure above, we show the distribution of sent and received messages per cluster. Clusters 3, 6, and 7 are the most active, while in Cluster 1 most people did not send at all. In Cluster 3 almost all people receive the same number of messages, while the number sent differs between guests. This indicates that messages are commonly sent to all peers in the group.

 

[Figure: in_out_degree_cluster.png]

 

Looking at the number of peers, Clusters 2, 6, and 7 are the most active ones. Cluster 2 also has a wide range of peers per user. Clusters 3, 4, and 5 have a similar number of peers, which looks like normal social behavior for small groups such as school classes. Cluster 3 has no variance, which is explained by its strongly closed group.

 

[Figure: ratios_cluster.png]

 

Next, we investigate the ratio of the number of messages sent to the sum of messages sent and received. Clusters 2, 3, 4, 6, and 7 have a similar share of people who send more and people who send less than they receive, and the average is almost 0.5 for these clusters. In contrast, Cluster 1 contains a lot of guests who send fewer messages than they receive. Cluster 5 is a very active cluster: people send more messages than they receive.
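The ratio described above is sent / (sent + received), so 0.5 means balanced communication, values below 0.5 mean mostly receiving, and values above 0.5 mean mostly sending. A minimal sketch follows; the function name and the handling of users without any messages are assumptions.

```python
def send_ratio(sent, received):
    """Share of sent messages among all messages of a user.

    Returns None for users without any messages, because the ratio
    is undefined in that case.
    """
    total = sent + received
    return sent / total if total else None


print(send_ratio(30, 10))
print(send_ratio(5, 5))
```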

 

[Figure: to_extern.png]

Our last point of interest is the communication with the special receiver “extern”. These messages could be “share with friends” messages, e.g. pictures from roller coasters. The graphic shows which fraction of people in each cluster send messages to extern. Guests in Clusters 1, 3, and 5 did not send any messages to extern. In Clusters 2 and 4 some people, one or two out of a thousand, send messages to extern. Most of the external communication comes from Clusters 6 and 7, where about six out of a thousand people send messages to extern. There are no messages that have extern as sender.

 

 

 

MC2.3 From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

 

Limit your response to no more than 3 images and 300 words. 

 

First of all, we suppose that anomalies in the park give guests a reason to write messages, either to friends to talk about what happened or to the personal assistance service to report issues.

 

[Figure: communication-distribution.png]

So we checked the number of sent messages, separated by day and land. Peaks in the histograms indicate a higher communication frequency during that time. There is a high peak in Coaster Alley on Friday at 16 o’clock. On Saturday and Sunday there are similar peaks in Coaster Alley, but the most significant peak is on Sunday from 11:30 to 12:30 in Wet Land, with a corresponding peak at 12 o’clock in the Entry Corridor. We should mention that the personal assistance service is located in the Entry Corridor. As the crime involves a stolen Olympic medal, which was to be exhibited in the Craighton Pavilion in Wet Land, this peak seems to be related to the crime.
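Histograms of this kind can be built by counting messages per day, land and time bin. The sketch below assumes (timestamp, location) pairs with datetime timestamps; the 30-minute bin width, the function names and the sample data are illustrative choices.

```python
from collections import Counter
from datetime import datetime


def bin_counts(messages, bin_minutes=30):
    """Count messages per (date, location, bin start as HH:MM)."""
    counts = Counter()
    for ts, location in messages:
        minute = (ts.hour * 60 + ts.minute) // bin_minutes * bin_minutes
        label = f"{minute // 60:02d}:{minute % 60:02d}"
        counts[(ts.date(), location, label)] += 1
    return counts


def peak_bin(counts):
    """Return the (key, count) pair with the highest message count."""
    return max(counts.items(), key=lambda kv: kv[1])


# Made-up sample, not challenge data.
sample = [
    (datetime(2014, 6, 8, 11, 35), "Wet Land"),
    (datetime(2014, 6, 8, 11, 50), "Wet Land"),
    (datetime(2014, 6, 8, 12, 5), "Entry Corridor"),
    (datetime(2014, 6, 8, 11, 10), "Wet Land"),
]
print(peak_bin(bin_counts(sample)))
```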

[Figure: dist_day2_11-30.png]

 

The figures above show the average message distribution over the last 60 seconds before the given time. We enriched the message data with the last position of the sender to locate each message more accurately. As you can see, at 11:30 a large number of messages come from the exhibition hall. Maybe other guests noticed the missing medal and reported it to friends and the park administration.
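Enriching a message with the sender's last position amounts to looking up the latest movement record at or before the message time. This is a minimal sketch, assuming timestamps in seconds, per-user movement tracks sorted by time as (timestamp, x, y) tuples, and messages as (timestamp, sender) pairs; all names are hypothetical.

```python
from bisect import bisect_right


def last_position(track, t):
    """Return (x, y) of the latest track record at or before t, or None."""
    times = [record[0] for record in track]
    i = bisect_right(times, t)
    return track[i - 1][1:] if i else None


def recent_message_positions(messages, tracks, now, window=60):
    """Positions of all messages sent in the last `window` seconds."""
    positions = []
    for ts, sender in messages:
        if now - window < ts <= now:
            pos = last_position(tracks.get(sender, []), ts)
            if pos is not None:
                positions.append(pos)
    return positions


# Made-up sample, not challenge data.
track_a = [(0, 1, 1), (50, 2, 3), (100, 5, 5)]
messages = [(10, "a"), (70, "a"), (95, "b")]
print(recent_message_positions(messages, {"a": track_a}, now=100))
```

Messages from senders without any earlier movement record are skipped here; whether to drop them or fall back to the coarse location in the message data is a design choice.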

 

[Figure: dist_day2_12-0.png]

 

Later, at 12 o’clock, there is hectic communication all over the park. Maybe the park administration informed all guests about the crime, or maybe the police arrived and gave instructions via the personal assistance service.